Abstract

Classical parametric multinomial logit models are in general not sufficient for detecting
the complex patterns that voter profiles nowadays typically exhibit. In this manuscript,
we consider a semiparametric multinomial logit model for analyzing the composition
of electorates, and give a detailed analysis of a subsample of the German electorate
in 2006. Germany is a particularly strong case for more flexible nonparametric approaches,
since, owing to reunification and the different political histories that preceded it,
the composition of the electorate is very complex and nuanced. Our analysis reveals
strong interactions between the covariates age and income, and highly nonlinear shapes of
the covariate effects on the probability of supporting each party. Notably, we develop and
provide a smoothed likelihood estimator for the suggested semiparametric multinomial
logit model, which can also be applied in other fields, such as marketing.

Keywords: kernel regression; multiple choice models; profile likelihood; semiparametric
modelling; voter profiling.



1 Introduction and Motivation 

The multinomial logit (MNL) model makes it possible to investigate the influence of a vector
of covariates on a categorical response variable with more than two, possibly unordered,
outcomes. MNL regression is particularly useful for analysing how socio-economic factors and
other covariates affect an individual's likelihood of supporting various political parties.
Such information is of great importance for policy makers and analysts, for example when
designing campaigns for targeted voter groups. Regression models can also be useful in
forecasting election outcomes from opinion polls, and in particular from exit polls
(Curtice et al., 2008, Fisher et al., 2011). However, we claim that conventional,
parametric MNL models are in general not adequate for capturing the complex patterns
typically exhibited by voter profiles. We thus consider a semiparametric extension of
the MNL model, with estimation conducted via kernel smoothing and a profile likelihood
algorithm. The modelling approach can be seen either as a more flexible multinomial
regression approach (compared to purely parametric approaches), or as an exploratory tool
that may be used to find an appropriate parametric specification (in scenarios where this
is feasible). We apply our model to study political party affiliation in Germany and to
investigate the electorate profiles of the different political parties.

The MNL model became popular in econometrics through the work on brand choice
behaviour by McFadden (1974) and on urban travel demand by Domencich and McFadden
(1975). Since then the model has been used in a wide field of applications, but
still especially in studies of consumer behaviour. Several attempts have been made to include
nonlinear effects of the explanatory variables. Krishnamurthi and Raj (1988) used logarithmic
transformations. Ben-Akiva and Lerman (1985) as well as Kalyanaram and Little
(1994) proposed using piecewise linear (utility) functions on predetermined (sub)intervals.
More recently, some authors developed nonparametric and semiparametric methods for
regression models with multicategorical response. Yee and Wild (1996) considered a
backfitting algorithm on a class of multivariate additive models using smoothing splines. Abe
(1998, 1999) proposed a special class of generalized additive models which accommodates
a multinomial qualitative response. His algorithm is based on a penalised likelihood
function and modified local scoring (Hastie and Tibshirani, 1986). Tutz and Scholz (2004)
approximate unspecified additive functions by a finite number of basis functions which
are penalised with respect to their localisation. Kneib et al. (2007) modified this using
penalised B-splines and a Bayesian approach for their estimation.

In this manuscript, we provide an alternative likelihood formulation that is localised via
kernels. More specifically, we consider profile likelihood estimation in the spirit of Severini
and Wong (1992). Our methods extend the work of Müller (2001), which in particular considers
the estimation of semiparametric models with binary response variables, to the general case
of multinomial responses. Statistical inference on the parameters can then be based on the
marginal Fisher information. Inference on the nonparametric part can be conducted using
a (semi-)parametric bootstrap along the lines of Härdle et al. (2004a).

We apply the semiparametric MNL model to data on political party affiliation in Germany
from 2006. We compare our results to corresponding results obtained from fitting a
parametric MNL model, thereby demonstrating the superiority of the more flexible
semiparametric approach when analyzing this kind of data. The increase in flexibility offered
by the semiparametric approach allows us to identify complex voter profiles with respect to
age, income, gender and region, with high nonlinearity in the covariate effects and strong
interaction between different covariates. While the former feature alone could relatively
easily be captured by generalized additive models as well, combined with the latter feature
this data structure can most naturally be modelled by considering multidimensional
nonparametric functions of covariates, motivating our kernel-based approach.

Section 2 describes the model specification and the estimation procedure. In Section
3 we introduce the data set on political party affiliation. In that section we also conduct
a classical, parametric MNL analysis, which subsequently serves as a benchmark for the
results of the semiparametric approach. Section 4 is then dedicated to the detailed analysis
of a semiparametric MNL model, fitted to the political party affiliation data using the
proposed method. Conclusions are given in Section 5.

2 The semiparametric multinomial logit model

2.1 The model

Consider a semiparametric multinomial logit model with $K$ different outcome categories
that have no natural order. The conditional probability of outcome $Y = k$, $k = 1, \ldots, K$,
given the individual covariate vectors $X = (X_1, \ldots, X_p)^\top \in \mathbb{R}^p$ and
$T = (T_1, \ldots, T_q)^\top \in \mathbb{R}^q$, is assumed to be given by

$$P(Y = k \mid X, T) = \frac{\exp\{X^\top \beta_k + m_k(T)\}}{\sum_{j=1}^{K} \exp\{X^\top \beta_j + m_j(T)\}}. \qquad (1)$$

To ensure identifiability we set $\beta_K = 0$ and $m_K \equiv 0$, i.e. $K$ is the reference mode.
Each $m_k(\cdot)$, $k = 1, \ldots, K$, is assumed to be a smooth function with domain $\mathbb{R}^q$, and each
$\beta_k = (\beta_{k1}, \ldots, \beta_{kp})^\top$, $k = 1, \ldots, K-1$, denotes an unknown parameter vector. Variables
that depend on both modes and individuals could be considered as well. Note that the
nonparametric functions also capture any mode-specific effect (see further discussion
below). Thus, $X$ must not contain mode-specific dummies.
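
As a minimal illustration of model (1), the following Python sketch evaluates the choice probabilities for one individual, assuming the coefficient vectors and the values $m_k(t)$ are already available. The function name `mnl_probabilities` and all numbers are hypothetical and serve only to show the role of the reference mode.

```python
import math

def mnl_probabilities(x, betas, m_values):
    """Choice probabilities of model (1) for one individual.

    x        : list of p covariate values (parametric part)
    betas    : list of K-1 coefficient lists; the reference mode K has beta_K = 0
    m_values : list of K-1 values m_k(t); the reference mode K has m_K(t) = 0
    """
    # Linear predictors eta_k = x' beta_k + m_k(t) for the non-reference modes.
    etas = [sum(xj * bj for xj, bj in zip(x, beta)) + m
            for beta, m in zip(betas, m_values)]
    etas.append(0.0)  # reference mode K
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(etas)
    weights = [math.exp(e - shift) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical illustration: K = 3 modes, p = 2 dummy covariates.
probs = mnl_probabilities(x=[1.0, 0.0],
                          betas=[[0.3, -0.2], [-0.5, 1.0]],
                          m_values=[0.4, -1.2])
print(probs)
```

The probabilities sum to one by construction, and only differences relative to the reference mode are identified, which is why its predictor is fixed at zero.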

2.2 Estimation

If the functions $m_k(\cdot)$ were known it would be easy to find estimators for the vectors $\beta_k$,
and vice versa. Following the ideas of profiled likelihood by Severini and Wong (1992), the
functions $m_k(\cdot)$ are regarded as nuisance parameters when estimating the finite-dimensional
parameters $\beta_k$. The functions $m_k(\cdot)$ themselves can be estimated via kernel smoothing. Note
that the estimate of $m_k(\cdot)$ will depend on all $\beta_j$, $j = 1, \ldots, K-1$, which is indicated by the
index $\beta_\cdot$ in $m_{k,\beta_\cdot}(\cdot)$. This yields asymptotically normal, $\sqrt{n}$-consistent and efficient
estimators for the vectors $\beta_k$ owing to likelihood estimation. They are asymptotically unbiased
(the bias is of order $o(1/\sqrt{n})$) and the variance is obtained from the inverse of the Fisher
information. For the functions $m_k$, $k = 1, \ldots, K-1$, one obtains consistent estimators with
statistical properties typical for nonparametric kernel smoothing, and inference can be done
via bootstrap (cf. Rodriguez-Poo et al., 2003).

In order to estimate the so-called least favourable curve $m_{k,\beta_\cdot}(t)$ at point $t := (t_1, \ldots, t_q)$
for given $\beta_j$, $j = 1, \ldots, K-1$, we use a $q$-dimensional kernel $\mathcal{K}: \mathbb{R}^q \to \mathbb{R}$, a bandwidth
matrix $H \in \mathbb{R}^{q \times q}$, and consider the local likelihood

$$\mathcal{L}_s(m_{k,\beta_\cdot}(t)) = \sum_{i=1}^{n} (\det H)^{-1} \mathcal{K}(H^{-1}(t - t_i)) \, \ell(\eta_i(m_{k,\beta_\cdot}(t)), y_i), \qquad (2)$$

with $\eta_i(m_{k,\beta_\cdot}(t)) := (\eta_{1i}, \ldots, \eta_{ki}(m_{k,\beta_\cdot}(t)), \ldots, \eta_{Ki})$,
where $\eta_{ki}(m_{k,\beta_\cdot}(t)) := x_i^\top \beta_k + m_{k,\beta_\cdot}(t)$
and $\eta_{ji} := x_i^\top \beta_j + m_{j,\beta_\cdot}(t_i)$ for $j \neq k$,

with row vector $x_i^\top = (x_{i1}, \ldots, x_{ip})$ and $t_i = (t_{i1}, \ldots, t_{iq})$. Here $\ell(\eta_i(m_{k,\beta_\cdot}(t)), y_i)$ denotes the
log-likelihood of (1) of the $i$th observation with predictor $\eta_i(m_{k,\beta_\cdot}(t))$, wherein $\beta_1, \ldots, \beta_{K-1}$
and $m_{j,\beta_\cdot}(t_i)$ for $j \neq k$ are treated as fixed, such that $\eta_i$ is a function only of $m_{k,\beta_\cdot}$ in (2).
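
The kernel weights $(\det H)^{-1} \mathcal{K}(H^{-1}(t - t_i))$ entering (2) can be sketched as follows. This is only an illustration under two assumptions that the text does not prescribe: a product Gaussian kernel and a diagonal bandwidth matrix. The function names and the toy observation points are hypothetical.

```python
import math

def gaussian_product_kernel(u):
    """q-dimensional product Gaussian kernel evaluated at the vector u (assumed kernel)."""
    return math.prod(math.exp(-0.5 * uj * uj) / math.sqrt(2.0 * math.pi) for uj in u)

def local_weights(t, t_obs, bandwidths):
    """Weights (det H)^{-1} K(H^{-1}(t - t_i)) for a diagonal H = diag(bandwidths)."""
    det_h = math.prod(bandwidths)
    weights = []
    for ti in t_obs:
        # H^{-1}(t - t_i) reduces to a componentwise division for diagonal H.
        u = [(tj - tij) / h for tj, tij, h in zip(t, ti, bandwidths)]
        weights.append(gaussian_product_kernel(u) / det_h)
    return weights

# Hypothetical observation points (log-income, age) and evaluation point t.
t_obs = [[7.5, 30.0], [8.0, 55.0], [9.0, 70.0]]
weights = local_weights(t=[8.0, 55.0], t_obs=t_obs, bandwidths=[0.3, 8.0])
print(weights)
```

Observations close to the evaluation point receive the largest weight, so the local likelihood (2) is dominated by individuals with similar values of $T$.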

With an estimate for $m_{k,\beta_\cdot}(\cdot)$ at hand, we can compute the profile log-likelihood

$$\mathcal{L}_p(\beta_k) = \sum_{i=1}^{n} \ell(\eta_i(\beta_k), y_i), \qquad (3)$$

where now $\eta_i(\beta_k) := (\eta_{1i}, \ldots, \eta_{ki}(\beta_k), \ldots, \eta_{Ki})$,
with $\eta_{ki}(\beta_k) := x_i^\top \beta_k + m_{k,\beta_\cdot}(t_i)$
and $\eta_{ji}$, $j \neq k$, as before. Note that $\eta_i(\cdot)$ is a function of $\beta_k$ in (3).

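For the estimation procedure we further need the first two derivatives of $\ell_i(\eta) := \ell(\eta, y_i)$
with respect to $\eta_k$, $\eta_k = x_i^\top \beta_k + m_k(t_i)$. We have

$$\ell_i(\eta) = \sum_{k=1}^{K} I_{\{y_i = k\}} \, \eta_{ki} - \log \sum_{j=1}^{K} \exp(\eta_{ji}), \qquad (4)$$

where $I$ is the indicator function. It then follows immediately that

$$\ell'_{ik}(\eta) = I_{\{y_i = k\}} - \frac{\exp(\eta_{ki})}{\sum_{j=1}^{K} \exp(\eta_{ji})}$$

and

$$\ell''_{ik}(\eta) = - \frac{\exp(\eta_{ki}) \cdot \sum_{j=1}^{K} \exp(\eta_{ji}) - \exp(2\eta_{ki})}{\left\{\sum_{j=1}^{K} \exp(\eta_{ji})\right\}^2}.$$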
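
The two derivatives can be checked numerically. The sketch below implements (4) and its derivatives in plain Python and compares them with central finite differences; writing $p_k$ for the probability of mode $k$, the derivatives are $I_{\{y_i = k\}} - p_k$ and $-p_k(1 - p_k)$. Function names, the 0-based category coding and the example predictor are hypothetical choices for illustration.

```python
import math

def probabilities(eta):
    """Softmax of the predictor vector eta (numerically stabilised)."""
    shift = max(eta)
    w = [math.exp(e - shift) for e in eta]
    total = sum(w)
    return [wj / total for wj in w]

def loglik(eta, y):
    """Log-likelihood contribution (4); y is the observed category, coded 0-based."""
    shift = max(eta)
    return eta[y] - (shift + math.log(sum(math.exp(e - shift) for e in eta)))

def dloglik(eta, y, k):
    """First derivative with respect to eta_k: I{y = k} - p_k."""
    return (1.0 if y == k else 0.0) - probabilities(eta)[k]

def d2loglik(eta, k):
    """Second derivative with respect to eta_k: -p_k (1 - p_k)."""
    p_k = probabilities(eta)[k]
    return -p_k * (1.0 - p_k)

def numeric_derivative(f, x, h=1e-5):
    """Central finite difference, used only as a check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Hypothetical predictor with K = 3 modes; derivative taken in the first component.
eta = [0.7, -1.7, 0.0]
first_numeric = numeric_derivative(lambda v: loglik([v, eta[1], eta[2]], 0), eta[0])
second_numeric = numeric_derivative(lambda v: dloglik([v, eta[1], eta[2]], 0, 0), eta[0])
print(dloglik(eta, 0, 0), first_numeric)
print(d2loglik(eta, 0), second_numeric)
```

The second derivative is negative for every mode, so the local likelihood is concave in each $m_k(t)$ and Newton-type updates are well defined.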

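To obtain the maximum of the smoothed likelihood $\mathcal{L}_s(m_{k,\beta_\cdot}(t))$, successively from mode
1 to mode $K$, we have to solve the first order condition

$$\sum_{i=1}^{n} (\det H)^{-1} \mathcal{K}(H^{-1}(t - t_i)) \, \ell'_{ik}(\eta_i(m_{k,\beta_\cdot}(t))) = 0 \qquad (5)$$

with respect to $m_{k,\beta_\cdot}(t)$. For $\beta_k$ the equation system to solve is

$$\sum_{i=1}^{n} \ell'_{ik}(\eta_i(\beta_k)) \, (x_i + m'_{k,\beta_\cdot}(t_i)) = 0, \qquad (6)$$

wherein $m'_{k,\beta_\cdot}(t_i)$ denotes the gradient of $m_{k,\beta_\cdot}(t_i)$ with respect to $\beta_k$. By differentiating
equation (5) with respect to $\beta_k$, one obtains

$$m'_{k,\beta_\cdot}(t) = - \frac{\sum_{i=1}^{n} (\det H)^{-1} \mathcal{K}(H^{-1}(t - t_i)) \, \ell''_{ik}(\eta_i(m_{k,\beta_\cdot}(t))) \, x_i}{\sum_{i=1}^{n} (\det H)^{-1} \mathcal{K}(H^{-1}(t - t_i)) \, \ell''_{ik}(\eta_i(m_{k,\beta_\cdot}(t)))}. \qquad (7)$$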

Equations (5) to (7) can now be used to implement a Newton-Raphson-type algorithm,
involving the following four steps:

1. Find appropriate starting values $\beta_k^{(0)}$ and $m_k^{(0)}(\cdot)$, $k = 1, \ldots, K-1$ (e.g. by fitting an
appropriate parametric MNL model) and set $j = 0$.

2. For $k = 1, 2, \ldots, K-1$, compute

$$\beta_k^{(j+1)} = \beta_k^{(j)} - B^{-1} \sum_{i=1}^{n} \ell'_{ik}(\eta_i(\beta_k^{(j)})) \, (x_i + m'_{k,\beta_\cdot}(t_i))$$

with $B = \sum_{i=1}^{n} \ell''_{ik}(\eta_i(\beta_k^{(j)})) \, (x_i + m'_{k,\beta_\cdot}(t_i)) (x_i + m'_{k,\beta_\cdot}(t_i))^\top$

and $m'_{k,\beta_\cdot}(t_i)$ as in (7).

3. For $k = 1, 2, \ldots, K-1$, compute

$$m_{k,\beta_\cdot}^{(j+1)}(t) = m_{k,\beta_\cdot}^{(j)}(t) - \frac{\sum_{i=1}^{n} (\det H)^{-1} \mathcal{K}(H^{-1}(t - t_i)) \, \ell'_{ik}(\eta_i(m_{k,\beta_\cdot}^{(j)}(t)))}{\sum_{i=1}^{n} (\det H)^{-1} \mathcal{K}(H^{-1}(t - t_i)) \, \ell''_{ik}(\eta_i(m_{k,\beta_\cdot}^{(j)}(t)))}$$

for all points $t$ at which the function $m_{k,\beta_\cdot}(\cdot)$ is to be estimated.

4. Repeat steps 2.-3. for $j = 1, 2, \ldots$ until convergence.
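
The nonparametric update in step 3 can be sketched in isolation. The Python fragment below solves the first order condition (5) for a single mode $k$ at a single point $t$ by Newton-Raphson, holding the coefficient vectors and the other modes' functions fixed. It is a simplified illustration, not the full four-step algorithm; the function `solve_local_m`, its arguments and the toy data are hypothetical.

```python
import math

def probabilities(eta):
    """Softmax of the predictor vector eta (numerically stabilised)."""
    shift = max(eta)
    w = [math.exp(e - shift) for e in eta]
    total = sum(w)
    return [wj / total for wj in w]

def solve_local_m(k, weights, offsets, y, m_start=0.0, tol=1e-10, max_iter=100):
    """Solve (5) for m_k(t) by Newton-Raphson with everything else held fixed.

    weights : kernel weight of each observation at the evaluation point t
    offsets : per observation, a list of K predictors; entry j != k is the
              full predictor eta_ji, entry k is x_i' beta_k without m_k(t)
    y       : observed categories, coded 0-based
    """
    m = m_start
    for _ in range(max_iter):
        score = 0.0      # weighted sum of first derivatives (numerator in step 3)
        curvature = 0.0  # weighted sum of second derivatives (denominator in step 3)
        for w, off, yi in zip(weights, offsets, y):
            eta = list(off)
            eta[k] += m
            p_k = probabilities(eta)[k]
            score += w * ((1.0 if yi == k else 0.0) - p_k)
            curvature += w * (-p_k * (1.0 - p_k))
        step = score / curvature
        m -= step
        if abs(step) < tol:
            break
    return m

# Hypothetical toy check: K = 2 modes, no covariates, equal weights.
y = [0, 0, 0, 1]
m_hat = solve_local_m(k=0, weights=[1.0] * 4, offsets=[[0.0, 0.0]] * 4, y=y)
print(m_hat)
```

In this toy setting, with equal weights and no covariates, the solution reduces to the empirical log-odds of mode 0 against the reference mode, here $\log 3$, which provides a simple correctness check for the update.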

It is convenient to estimate the functions $m_{k,\beta_\cdot}(\cdot)$ in step 3 at the observation points $t_i$,
$i = 1, \ldots, n$, as this guarantees that, independent of the bandwidth choice, $\mathcal{K}(H^{-1}(t - t_i))$
is nonzero for at least one observation. For alternative implementations see,
for example, Chapter 7 in Härdle et al. (2004b). We implemented different modifications
of the algorithm, obtaining basically the same results (results not shown). Mode-specific
intercepts are not explicitly incorporated into our semiparametric model, since the vertical
location (not the shape) of the functions accounts for them. They describe the unexplained
heterogeneity over modes. If one is interested in including mode-specific intercepts (which
have to vary also over individuals to be distinguishable from the functions $m_k$), one can
proceed as before but with coefficients and functions that are fixed over the $K$ modes. In
other words, we here derived the estimation algorithm for the more complex model. In our
application the available information on the different political parties does not vary over
individuals.

If the dimension $q$ of the variable $T$ increases, such that the curse of dimensionality becomes
an issue (both in terms of quality of the estimation and with respect to interpretation of
the nonparametric part), then further structure on the mode functions $m_k(T)$ is required.
The most popular such structure is additive separability, i.e. modelling

$$m_k(T) = \alpha_k + \sum_{l=1}^{q} m_{k,l}(T_l), \qquad k = 1, \ldots, K.$$

There are at least two straightforward, though computationally tedious, extensions
conceivable: backfitting or marginal integration. In the profiled likelihood context with marginal
integration, one could adapt the procedure of Härdle et al. (2004a). For backfitting,
probably the most efficient version is the smooth backfitting for generalized structured models
(Roca-Pardinas and Sperlich, 2010), which would have to be adapted to multinomial
responses. A spline version for additive regression of multicategorical data has been proposed
by Tutz and Scholz (2004).



3 Party affiliation data and prior parametric approach
3.1 SOEP data 

Our aim is to identify typical voter groups of the dominant political parties in the
multi-party system of Germany. In order to make subsequent interpretations accessible to a
broad readership, we begin by briefly sketching the historical background of the German
party system. Nowadays, the German party system comprises five main political parties:
the Christian Democratic Union together with its Bavarian counterpart (Christian Social
Union), which form one liberal-conservative parliamentary group (CU henceforth), the
Social Democratic Party (SPD), the liberal Free Democratic Party (FDP), the democratic
socialist Left Party (LP) and the green party Alliance '90/The Greens (A90G). It is
these five parties that we will focus on in the subsequent analysis.

From the establishment of the German Bundestag (i.e., the German parliament) in
1949 until the German reunification in 1990, Western Germany was governed either by the
CU or the SPD, with absolute majority or in a coalition with the FDP, respectively. In the
seventies diverse groups of alternative green activists contested various local elections,
and in 1980 a green organisation was founded at the federal level. It comprised distinct
groups such as the anti-nuclear movement, the student movement, feminist groups and the
peace movement (Losche, 1993). They won their first seats in the German Bundestag in 1983,
and from 1998 to 2005 they formed the government in a coalition with the SPD. In the East
German dictatorship prior to the reunification, the Socialist Unity Party (SED) had sole
political power, although small and well-controlled Christian and liberal parties co-existed
to give the system a semblance of legitimacy. After reunification the so-called PDS was
founded as the heir of the SED. In 2005, the PDS entered an alliance with the then newly founded
Western German party "Labour and Social Justice - Electoral Alternative (WASG)". Since
2007 the alliance has simply been called "The Left" (LP throughout this manuscript). It achieved
8.7% in the 2005 election to the German Bundestag (26% in East Germany).

The data for the subsequent analysis were extracted from the German Socio-Economic
Panel (SOEP) of the year 2006. The variable of interest is political party affiliation, i.e.,
the answer to the question "Toward which party do you lean?". As is commonly done in
the literature, the socio-economic factors that are considered are age, log-income (i.e., the
logarithm of the monthly net total household income), region of origin, and gender (cf.
Quinn et al. 1999, Dow and Enderby, 2004). We decided not to include the covariates
education and religion, which are sometimes considered in this type of analysis, for the
following reasons. First, the reported years of education as well as vocational qualifications
are hardly comparable between Eastern and Western Germany. Second, this also applies
to the reported religion: while in Western Germany the majority of the people officially
still belong to either the Protestant or the Roman Catholic church, in Eastern Germany
Christians are a minority. More than to anything else, this difference is related to the
socialist history: the affiliation to a church could easily entail serious negative consequences
for the family. The aim of this paper, however, is not to give a detailed investigation of
the differences between voter profiles in Western and Eastern Germany, but rather to
illustrate the usefulness of the proposed modelling approach in voter profiling in general,
and to draw a detailed picture of the combined Eastern and Western Germany electorate.
We do however acknowledge the different histories of Western and Eastern Germany in our
approach by including a dummy variable 'East' in the model, indicating whether or not a
person resided in Eastern Germany before reunification. This way we also account for
several of the issues discussed above.

In total, 8787 individuals reported their party affiliation in the original data set. From
this data set we excluded 376 (4.3%) individuals who favoured a party different from the
five main parties that we focus on, and 426 (4.8%) individuals who made an implausible
or no declaration about their income. In the remaining subsample, 227 persons who lived
abroad before reunification, or did not report their regional origin, were assigned to
Western Germany (the results remained virtually identical when we excluded them). The
descriptive statistics of the considered sample are summarized in Tables 1 and 2. The
income is strongly skewed to the right, whereas age is slightly skewed to the left. The
affiliation to the large parties seems to be overrepresented, and the one to the smaller
parties underrepresented, compared to election results. One explanation for this may be
that smaller parties, such as in particular the FDP, tend to have a higher share of voters
in the group of people with no party affiliation, since these individuals may, for example,
vote along tactical considerations.
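
The sample sizes reported above can be verified with a few lines of arithmetic. The short Python check below uses only the counts stated in the text; the variable names are hypothetical.

```python
reported = 8787      # individuals reporting a party affiliation
other_party = 376    # favoured a party outside the five main parties
bad_income = 426     # implausible or missing income declaration

# Size of the analysed subsample after both exclusions.
remaining = reported - other_party - bad_income

# Exclusion shares relative to the original data set, in percent.
share_other = 100.0 * other_party / reported
share_income = 100.0 * bad_income / reported

print(remaining, round(share_other, 1), round(share_income, 1))
```

The resulting subsample size agrees with the row totals of Table 1 (3,922 + 4,063 and 1,718 + 6,267).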

Table 1: Descriptive statistics for the considered covariates (with upper half of the table
considering the covariates that parametrically enter into the model, and lower half of the
table considering the covariates that nonparametrically enter into the model).

Variable                      Yes = 1    No = 0    Yes (in %)    No (in %)
X1  Gender (1 if female)        3,922     4,063         49.12        50.88
X2  Region (1 if East)          1,718     6,267         21.52        78.48

Variable                          Min       Max          Mean       Median
T1  Income (Euro/Month)           400    30,000         3,089        2,600
T2  Age (Years)                    21        97          53.8         54.0

Table 2: Percentages of reported political affiliation. 

Party CU SPD A90G LP FDP 

Affiliation (in %) 42.17 37.86 8.87 6.35 4.74 

There are two substantially different ways to analyse voter groups: 1) using purely
descriptive statistics based on public-opinion polls, as routinely published by market research
institutes such as Infratest dimap or Forsa in Germany, and 2) employing inferential statistics
by fitting adequate models. When it comes to voter profiling, one of the main drawbacks
of 1) is that the distribution of the voter's choice is not quantified based on statistical
laws and hence cannot be used to support inferential statements about the population.
Furthermore, such analyses typically focus on only one or two covariates at a time. Models
such as the MNL model attempt to overcome these deficiencies by modelling the voter's
party affiliation as the outcome of a distribution that depends on a number of covariates.
However, the MNL model and similar parametric models also have limitations: they are based
on assumptions concerning the specific functional form that links the covariates to the
outcome. Another limitation is the commonly assumed additive separability and the implied
neglect of possible interactions between different covariates. The proposed semiparametric
model for multicategorical data attempts to overcome these deficiencies. These aspects
will be studied in detail in the course of the subsequent sections.

3.2 Results of fitting a parametric MNL model

We initially consider a fully parametric MNL model. One purpose of doing so is to point
out its deficiencies compared to the semiparametric model that we propose. It will also
later serve as a benchmark when interpreting the results from fitting a semiparametric
MNL model. Finally, it allows us to easily test the assumption of independence of irrelevant
alternatives. For the parametric MNL model, the estimated coefficients of the log-odds are
given in Table 3.

Table 3: Parameter estimates for a parametric MNL model with CU as the reference
category (standard errors in parentheses; ** indicates significance at the 1%, * at the 5%,
and • at the 10% level).

Party   Mode effect         Female              East                log(Income)         Age/10
SPD      3.661(0.382)**     -0.013(0.051)       -0.088(0.066)       -0.385(0.045)**     -0.013(0.002)**
A90G     0.009(0.617)        0.307(0.085)**     -0.338(0.121)*       0.071(0.074)       -0.044(0.003)**
LP       1.811(0.808)*      -0.253(0.103)*       2.641(0.117)**     -0.569(0.097)**     -0.007(0.003)*
FDP     -4.762(0.827)**     -0.567(0.114)**      0.235(0.141)*       0.496(0.097)**     -0.023(0.004)**



As reference category we used the largest party, i.e., the CU. Relative to the reference
mode, being from the East substantially raises the likelihood of being affiliated to the
LP. The CU is known to have strong support from older individuals, and it is thus not
surprising that the impact of age is found to be significantly negative for all other parties.
The green party, A90G, has particularly strong support in the group of female voters, and
is not that strongly represented in Eastern Germany. Indeed, according to Walter (2008),
being a young and female Western German resident is the typical characterisation of an
A90G voter. On average, a high income decreases the likelihood of supporting
the LP and the SPD, while it significantly increases that of supporting the FDP.
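
To make the size of the regional effect concrete, the coefficients of the 'East' dummy in Table 3 can be converted to odds ratios relative to the CU, since in an MNL model the exponential of a coefficient is the multiplicative change in the odds of the respective party versus the reference category. The following Python lines are a small worked example using the values from Table 3; the dictionary names are hypothetical.

```python
import math

# Coefficients of the 'East' dummy from Table 3 (log-odds relative to the CU).
east_coef = {"SPD": -0.088, "A90G": -0.338, "LP": 2.641, "FDP": 0.235}

# exp(coefficient) = factor by which the odds of party k versus CU change
# for a person from the East, all other covariates held fixed.
odds_ratio = {party: math.exp(b) for party, b in east_coef.items()}

for party, ratio in odds_ratio.items():
    print(f"{party}: odds ratio vs. CU = {ratio:.2f}")
```

For the LP the odds relative to the CU are thus roughly fourteen times higher in the East, whereas for A90G they are reduced by about 30%.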

A possible criticism here is that we did not use a nested (or say, two-level) MNL model, or a
multinomial probit model (MNP model), to avoid the assumption of independence of irrelevant
alternatives (IIA). The obvious argument against the IIA is that voters might first choose between
either left or conservative parties, and then in a second step decide amongst parties within
these groups. An argument in favour of the IIA is that for the present German party
system it is no longer that obvious to the voters (as it perhaps was in the past) what
exactly a right-left classification implies. Consequently, it is for example by no means clear
whether a former voter of the CU switches to the SPD, A90G or the FDP. Application of
the computationally quite complex MNP model, for which it is not clear to us how it can
be extended and implemented semiparametrically, thus does not seem appropriate in the
given application. Likewise we believe it to be difficult to apply a nested MNL model, for
which it is essential to correctly predefine adequate subsets of parties and their correlation
structure. In any case one can test the IIA using the Hausman-McFadden test (1984) or
using the Small-Hsiao test (1985), at least in the parametric world. We did this for all
permutations of the reference category, and repeated the tests including also the squares
of log-income and age to see whether the tests reject the IIA for more flexible models. In
all cases we obtained p-values above 0.95. This indicates that the IIA is not problematic for
the political party affiliation in Germany, say since 2000, which is an interesting finding in
its own right.



4 Semiparametric analysis of voter profiles 

Since the turn of the century the influence of the big parties has been declining, and the number
of floating voters has been increasing. In general, the identification with the different political
parties has significantly decreased (cf. Alemann, 2003). This development calls for a more
detailed analysis of the different voter profiles than the purely parametric MNL can offer.
We thus consider a more flexible model, namely the semiparametric MNL model that has
been introduced in Section 2. The details of its implementation in the given application
are as follows. The two dummies corresponding to gender and region enter parametrically,
while the other two covariates, age and log-income, go to the nonparametric part:

$$P(y = k \mid \text{Sex}, \text{East}, \text{Inc}, \text{Age}) = \frac{\exp\left(\beta_{1,k} \cdot \text{Sex} + \beta_{2,k} \cdot \text{East} + m_k(\text{Inc}, \text{Age})\right)}{\sum_{j=1}^{5} \exp\left(\beta_{1,j} \cdot \text{Sex} + \beta_{2,j} \cdot \text{East} + m_j(\text{Inc}, \text{Age})\right)}.$$

The largest party (CU) was used as reference category (i.e., $\beta_{1,5} = \beta_{2,5} = m_5 \equiv 0$,
with index $j = 5$ referring to the CU). The smoothing parameter for the nonparametric
part was chosen on a grid of bandwidths $h$ from 0.4 to 1 times the standard deviation of
age and log-income, respectively. For the presentation of the results, we will show results
obtained using $h = 0.5$ times the standard deviations of these covariates. The estimated
parametric effects of the dummies gender and region are listed in Table 4. They are similar
to those obtained for the parametric MNL model, as given in Table 3, but have increased
significance.
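
The bandwidth grid described above can be set up as in the following Python sketch. It assumes a diagonal bandwidth matrix with entries $h$ times the sample standard deviation of each covariate and an equally spaced grid of factors; the grid spacing, the function name and the toy data are assumptions made for illustration only.

```python
import statistics

def bandwidth_grid(log_income, age, factors=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Diagonal bandwidths (h * sd(log-income), h * sd(age)) for each factor h.

    The grid spacing in `factors` is an assumption; the text only states
    that h ranges from 0.4 to 1.
    """
    sd_inc = statistics.stdev(log_income)
    sd_age = statistics.stdev(age)
    return {h: (h * sd_inc, h * sd_age) for h in factors}

# Hypothetical toy sample of log-incomes and ages.
log_income = [7.2, 7.9, 8.1, 8.6]
age = [25.0, 41.0, 58.0, 73.0]

grid = bandwidth_grid(log_income, age)
h_inc, h_age = grid[0.5]   # the factor used for the presented results
print(h_inc, h_age)
```

Scaling by the standard deviation puts both covariates on a comparable footing, so that a single factor $h$ controls the amount of smoothing in both directions.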

Table 4: Estimated coefficients with standard deviations in brackets for the parametric
part of the semiparametric MNL model with CU as the reference category. ** indicates
significance at the 1%, * at the 5%, and • at the 10% level.

Mode    SPD               A90G               LP                 FDP
Sex      0.006(0.047)      0.334(0.081)**    -0.217(0.100)*     -0.561(0.114)**
East    -0.107(0.059)*    -0.289(0.115)*      2.63(0.113)**      0.251(0.139)*



Figures 1 and 2 display the probabilities of supporting the different parties as
a function of age and log-income, for a woman from Western Germany. We used the
R packages rgl (Adler and Murdoch, 2009) and akima (based on Akima, 1978) to display the
estimated bivariate functions. The use of akima generates a smooth surface by bivariate
interpolation of irregularly spaced input data. The black points at the bottom of the plots
indicate the observations. The surfaces have been rotated such that the main features can
most easily be recognised. It should be stressed here that for other values of the dummies
for gender and region the surfaces change (as the probability function is not linear), but
only in the sense that some slopes become flatter or steeper; the general patterns remain
similar.




Figure 1: Conditional probabilities for women in Western Germany of being a supporter 
of the different parties (top left: CU; top right: SPD; bottom left: A90G; bottom right: 
FDP). The axis running from 6 to 10 refers to log-income, the one from 20 to 97 refers to 
age, and the ones restricted to subsets of the interval [0, 1] refer to the probability. 

For each party the signs of the slopes change frequently with both increasing age and
increasing log-income values. It is obvious that both nonlinearities and interactions play
an important role for the given voter profiles. Clearly it would be very difficult to find
adequate parametric models that can appropriately reflect both these important features
of the data. Furthermore, even a nonparametric additive decomposition could easily lead to
wrong conclusions, since certain characterisations of voter groups could not be well captured
by such a model, due to its averaging over all ages to derive the influence of the log-income
(and vice versa). For example, the valley observed for SPD at small log-incomes and young
ages would be concealed in such an analysis, and likewise, at least to some extent, would
be the distinct peak observed for LP supporters aged around 50 with very low income.
This finding is important, since it implies that most of the standard techniques that are
usually applied to this type of political party affiliation data are insufficient for conducting
adequate statistical inference, and can potentially conceal important data structure.

Over all age groups, the support of the SPD is strongest in the low income group, and
that of the CU and the FDP is strongest in the high income group. The highest support
for A90G is found for young voters, even though less so for the few young voters that have
a high income. For A90G and the CU the strongest factor influencing the probability of
being affiliated to the party is age (not income). For the LP, the main driving factor is the
region.

Considering the surface of probabilities for supporting the CU, we note that the general
upward trend in age is heterogeneous over income: for incomes > 3,000€ (8 on the log
scale), the trend is not strictly increasing over all ages, as it roughly is for lower incomes.
The upward trend in income is interrupted for incomes between 750€ and 3,000€, income
levels that constitute the major part of the population. Both trends are nicely contrasted
in the results obtained for the SPD. For the SPD, the resulting surface of probabilities
corroborates, to some extent, that the SPD is the party of the working class across all
ages. For incomes higher than 3,000€ the likelihood drops away, but there is strong
support from the middle class with an income of around 3,000€.

The stylized facts for A90G discussed above are confirmed, but with some
interesting nuances. The typical voter of the green party is generally believed to be either
young, with low income (mostly students), or middle-aged and relatively well-off. Young
individuals with low income indeed constitute a particularly strong voter block. The
expected prominent role of the upper middle class is also found. However, for ages > 50 the
latter feature diminishes. This is due to a strong support from individuals with academic
background, typically coming from the seventies' student movements, the eighties'
anti-nuclear and peace movement in Western Germany, and the peaceful revolution in Eastern
Germany.

The FDP has the smallest voter basis, such that the estimated probability surface in 
this case is more wiggly and should not be over-interpreted. However, it can be recognized 
that while age has no clear impact, a high income is most likely to render somebody an 
FDP supporter. Indeed, the impact of log-income on the probability of being affiliated to 
the FDP seems to be exponentially increasing. 






















Figure 2: Conditional probabilities for men from Eastern (left plot) and from Western 
Germany (right plot) of being a supporter of the LP. 



In order to illustrate the effect of the dummy variables, Figure 2 displays the probabilities
of being affiliated to the left party (LP), for voters from Eastern Germany versus voters
from Western Germany. We chose the same scales to emphasise how big the difference is.
For Eastern Germany we recognize strong support in the group of individuals with low
income aged between 40 and 60, whereas high income (above 6,000€) on average leads to
very small affiliation with the LP (which is to be expected for a left-wing party). We also
find a notable support of the LP in the upper middle class for individuals older than 50.
Exactly here one will typically find those who were quite involved in the SED dictatorship,
individuals who may have benefitted from the regime. These specific characterisations of
voter groups could not have been captured by the simpler modelling approaches that we
discussed, since those average over all scales.

5 Conclusions

We discussed a semiparametric MNL model and its utility when estimating political party
affiliation. Our approach extends the generalized partial linear model (GPLM) that, in
the case of binary response variables, has been discussed by Müller (2001). Our methods
are in compliance with the GPLM framework, such that the mathematical properties of
asymptotic normality, consistency and efficiency of the estimators remain valid. Our model
is directly applicable in other applications that are concerned with similar types of data,
such as brand choice in marketing studies or choice of transportation modes.

Important features of the semiparametric MNL model are that it does not assume a
specific functional form for the influence of a subset of the explanatory variables, and that
it allows for interactions between covariates. The model can capture binary, discrete and
continuous variables, with the latter being most suitable to be modelled nonparametrically.
It is possible to consider a simple linear structure, or to consider one-dimensional additive
as well as bivariate or multivariate nonparametric components.

The flexibility of the model leads to comprehensive insights into the profiles of specific
voter groups, insights that other popular modelling approaches cannot offer due to
inherently simplistic assumptions on the relation between response and explanatory variables.
More specifically, we found that our model overcomes many potential deficiencies of both
parametric modelling and nonparametric modelling assuming additive separability. The
strong nonlinearities that we found show how complex the structure of voter groups is
nowadays, and underline the need for more sophisticated approaches than parametric MNL
modelling, since the complicated nonlinearities that we found would be extremely hard to
capture adequately in a parametric framework. Additive modelling, which can be done
either by marginal integration or by backfitting, arguably can lead to better interpretability,
and is designed to allow for nonlinear covariate effects. However, in the present application
with the given strong interactions between the covariates age and income, such a
decomposition would still entail a considerable model misspecification and would thus conceal
important structures in the voter profiles. For a detailed discussion on nonparametric
additive modelling with and without interaction between covariates we refer to Sperlich et
al. (2002).